DH – Data harvester

1 Introduction

Vubis supports an automatic harvester for bibliographic records, using Z39.50 as the harvesting protocol.

The harvester automates the copy cataloguing workflow: it uses the Z39.50 cataloguing profiles to search Z39.50 targets and merges the retrieved data into your original cataloguing records.

Below is a description of the steps you need to take before you can use this functionality.

2 Setup

A matching profile needs to be set up in AFO 114 to specify the field(s) used to decide whether two records are the same.

A merging profile needs to be set up in AFO 115 to specify what must happen to fields when you merge records that are deemed to be 'the same' based on the matching profile.

Optionally, create a conversion profile in AFO 134 if the format of incoming records differs from the format used in your target database.

A standard import profile is defined in AFO 133 which specifies the Z39.50 input record type, any conversions if required, match and merge criteria, destination database and template, etc. This is linked to the previously defined matching and merging profiles (and optionally to the conversion profile).

This import profile is then specified on the definition for the Z39.50 Database in AFO 651. A Z39.50 target is created for the copy partners, and a cataloguing profile is created to link the original record key to the copy source. See AFO 651 - Z39.50 Parameters - Target Profiles - Cataloguing Profiles for details.

A selection profile is created in AFO 141 to retrieve a list of bibliographic records. There are two fields on the input form for creating a selection profile that are valid only for the bibliographic application: 'Update type' and 'Update profile'.

See the online help of the aforementioned AFOs for more detailed information on each step.
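The way the profiles reference one another can be pictured with a small data model. This is only an illustrative sketch: the class and field names are hypothetical and do not correspond to actual Vubis parameters, the target and target group layer of AFO 651 is left out for brevity, and the values "overwrite" and "harvest" are made-up labels.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MatchProfile:            # AFO 114
    match_fields: List[str]    # fields used to decide that two records are the same


@dataclass
class MergeProfile:            # AFO 115
    rule: str                  # hypothetical label, e.g. "overwrite"


@dataclass
class ConversionProfile:       # AFO 134 (optional)
    name: str


@dataclass
class ImportProfile:           # AFO 133, links the profiles above
    match: MatchProfile
    merge: MergeProfile
    destination_database: int
    conversion: Optional[ConversionProfile] = None


@dataclass
class Z3950Database:           # AFO 651, Z39.50 database definition
    name: str
    import_profile: ImportProfile


@dataclass
class CataloguingProfile:      # AFO 651, cataloguing profile (simplified)
    database: Z3950Database
    search_key: str


@dataclass
class SelectionProfile:        # AFO 141
    update_type: str           # hypothetical value
    update_profile: CataloguingProfile


# Build the chain bottom-up, in the same order as the setup steps.
import_profile = ImportProfile(
    match=MatchProfile(["020/$a", "022/$a"]),
    merge=MergeProfile("overwrite"),
    destination_database=6,
)
database = Z3950Database("copy partner", import_profile)
cataloguing = CataloguingProfile(database, search_key="020/$a,022/$a")
selection = SelectionProfile(update_type="harvest", update_profile=cataloguing)
```

Reading the chain from the bottom up, the selection profile leads to the cataloguing profile, which in turn leads to the import profile with its match and merge rules.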

3 Workflow

The selection profile can either be executed against a predefined savelist or be used to create a new savelist. The system loops through the resulting savelist and processes each bibliographic record as follows:

1. A Z39.50 search is started using the distinctive search criterion (key) defined in the selected cataloguing profile. This could be an EAN, ISSN, ISBN or a similar distinctive criterion.

2. The request has one of three possible outcomes:

· no record is returned – a message is written to the report file

· one record is returned – the bibliographic record is updated using the standard load and update tools

· more than one record is returned – a message is written to the report file
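The processing loop described above can be sketched as follows. This is a minimal illustration, not the Vubis implementation: the function `harvest` and its callback parameters `get_key`, `search` and `update` are hypothetical stand-ins for the cataloguing profile's search key, the Z39.50 search and the standard load and update tools. The "No key in local record" message is taken from the sample report.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Record = Dict[str, str]


def harvest(
    savelist: Iterable[Tuple[str, Record]],
    get_key: Callable[[Record], Optional[str]],
    search: Callable[[str], List[Record]],
    update: Callable[[str, Record], None],
) -> Dict[str, list]:
    """Process each record in the savelist and collect the outcome per record."""
    report: Dict[str, list] = {
        "loaded": [], "not_found": [], "multiple": [], "errors": [],
    }
    for record_id, record in savelist:
        key = get_key(record)
        if not key:
            # Nothing to search with, so the record is reported as an error.
            report["errors"].append((record_id, "No key in local record"))
            continue
        hits = search(key)
        if len(hits) == 0:
            report["not_found"].append(record_id)
        elif len(hits) == 1:
            # Exactly one hit: update the local record with the incoming one.
            update(record_id, hits[0])
            report["loaded"].append(record_id)
        else:
            # Ambiguous result: report it and leave the record untouched.
            report["multiple"].append(record_id)
    return report
```

Only the single-hit case changes data; the other outcomes just add a line to the report, which matches the three options listed above.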

The error and processing reports are stored in plain text (txt) format and can be viewed using the standard report capabilities in AFO 642.

Example:

We have a number of skeleton records that have been collected into a savelist.

We would like the system to find the full records for us on a selected Z39.50 source target.

1. Set up a match profile in AFO 114 to specify how incoming records are matched against existing records. In this case we will use the Marc21 020 $a (ISBN) and 022 $a (ISSN) index entries as our match criteria.

2. Create a merge profile ruleset in AFO 115 to specify how to perform the record merge. In this case we are going to overwrite the existing record with the incoming record from our source.

3. In AFO 134, create a conversion profile if required, to map data from the source (incoming) record to the destination record, that is, the record to be filed in the local database.

4. Create an import profile in AFO 133. This defines the character set, the format of the incoming record(s), the conversion profile to be used, the savelists to be used during the load process, and the links to the match profile, merge profile and destination database information. Since the harvesting protocol currently in use is Z39.50, ensure that the import format has been defined to use the proper ISO2709 format.

5. In AFO 651 - Z39.50, create a database and database group for the Z39.50 server from which source (incoming) records are to be retrieved. On the Data Source definition, specify the import profile created for harvesting (match and merge) in AFO 133.

6. In AFO 651 - Z39.50 - Target Profile, create a target or target group to be used for the query.

7. In AFO 651 - Targets - Cataloguing Profile, create or update a cataloguing profile.

Select your target profile group for the Z39.50 search group. The access point is the Use Attribute that will be sent to the target Z39.50 search group.

The search key defines the field(s)/subfield(s) of the source record whose data is sent as the search key to the target host.

Multiple source fields/subfields can be specified as a comma-delimited list (e.g. enter 020/$a,022/$a to specify a Marc21 ISBN or ISSN field).
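To make the search key notation concrete, the sketch below splits a specification such as 020/$a,022/$a into field/subfield pairs and picks a value from a record. It is an illustration only: the function names are hypothetical, the record is modelled as a simple nested dictionary rather than a real MARC structure, and taking the first populated field in the order listed is an assumption, not documented Vubis behaviour.

```python
from typing import Dict, List, Optional, Tuple


def parse_search_key(spec: str) -> List[Tuple[str, str]]:
    """Split '020/$a,022/$a' into [('020', 'a'), ('022', 'a')]."""
    pairs = []
    for part in spec.split(","):
        tag, _, subfield = part.strip().partition("/")
        pairs.append((tag, subfield.lstrip("$")))
    return pairs


def first_search_value(
    record: Dict[str, Dict[str, str]], spec: str
) -> Optional[str]:
    """Return the first populated field/subfield value, or None if there is none."""
    for tag, subfield in parse_search_key(spec):
        value = record.get(tag, {}).get(subfield)
        if value:
            return value
    return None


# A skeleton record with an ISSN but no ISBN (the value is made up).
skeleton = {"022": {"a": "1234-5678"}, "245": {"a": "Some journal"}}
key = first_search_value(skeleton, "020/$a,022/$a")
```

A record for which no listed field is populated yields no key at all, which corresponds to the "No key in local record" error in the sample report.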

8. In AFO 141 - Selection, create a new selection profile.

Specify the update profile as the cataloguing profile you created in AFO 651 / Targets.

Note

You must have at least one criterion defined. In our case we will use "if ISBN or ISSN is defined".

To Process

In AFO 141, create a savelist of the records you have in the current local database. These are the skeleton records which can be overwritten with the incoming data. Alternatively, the selection profile could be used to create its own result list. In this example, we have already created a savelist using database 1 and will execute the selection for the harvest, filing records to database 6.

From the savelist, execute the selection. Use the selection profile which has been previously defined for the harvesting or updating of records.

The result is that the skeleton record is either updated or created as a new record (depending upon the profile setup). In this example we used skeleton records from database 1 and filed new records into database 6.

Sample report

The harvester produces a report of which records were enriched, which were not, etc. It can also produce savelists of the enriched and non-enriched bibliographic records. These savelists are created in AFO 141. Their purpose is to let the user check the records and see what has been done, which might otherwise go unnoticed because the process is fully automatic. For bibliographic records that were not enriched, the savelist can be used later for manual review or for another harvesting session (against another database, or against the same database two months later).

The report shows numbers for various possibilities (loaded, not found etc.) and lists database and record number for problems (e.g. 6.245):

Z39.50 harvester

------------------------------------------------------------------

Records processed : 100

Loaded records : 76

Not found : 22

------------------------------------------------------------------

       6.245

       6.276

       6.315

       6.324

       6.329

       6.338

       6.341

       6.355

       6.361

       6.391

       6.393

       6.396

       6.399

       6.402

       6.404

       6.406

       6.409

       6.416

       6.417

       6.250

       6.360

       6.400

------------------------------------------------------------------

More than one found : 1

       6.366

------------------------------------------------------------------

Errors : 1

------------------------------------------------------------------

       6.199 : No key in local record

Or

Z39.50 harvester

------------------------------------------------------------------

Records processed : 4

Loaded records : 4

Not found : 0

More than one found : 0

Errors : 0
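If the report needs to be checked by a script, the summary lines can be read back with a few lines of code. The sketch below is an assumption-laden illustration: it relies only on the "label : number" layout visible in the samples above, the function names are hypothetical, and the check that the outcome counts add up to "Records processed" is inferred from these two samples rather than from a documented guarantee.

```python
import re
from typing import Dict

# Matches summary lines such as "Loaded records : 76". Detail lines such as
# "6.199 : No key in local record" are skipped because they start with a digit.
SUMMARY_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*:\s*(\d+)\s*$")


def parse_summary(report_text: str) -> Dict[str, int]:
    """Collect the counters of a harvester report into a dictionary."""
    counts: Dict[str, int] = {}
    for line in report_text.splitlines():
        match = SUMMARY_LINE.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def totals_agree(counts: Dict[str, int]) -> bool:
    """True if the individual outcomes add up to 'Records processed'."""
    outcomes = sum(v for k, v in counts.items() if k != "Records processed")
    return counts.get("Records processed") == outcomes


sample = """Z39.50 harvester
Records processed : 4
Loaded records : 4
Not found : 0
More than one found : 0
Errors : 0
"""
counts = parse_summary(sample)
print(counts["Loaded records"], totals_agree(counts))
```

The same functions can be applied to the first sample report, where the record numbers listed under each counter are ignored and only the counters are read.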


Document control - Change History

Version   Date            Change description               Author
1.0       November 2009   creation; part of 2.0 updates